mistyR can be used to analyze spatial transcriptomics data sets stored in SeuratObject with just a couple of functions. In this vignette we demonstrate how to build a user friendly workflow starting from data preprocessing, through running mistyR, to analysis of results, i.e., the spatial interactions between markers stored in alternative assays and specific locations.
In principle mistyR can be used with data extracted from a SeuratObject only. For the sake of completeness of the process of modeling with mistyR we demonstrate a complete workflow, including data preprocessing with Seurat. The functions provided in this notebook can be adapted to the user preference but the main objective is to exploit as much as possible the flexibility of workflow creation from mistyR and object manipulation from Seurat.
mistyR vignette
In this Rmarkdown document we want to assess the importance of upstream transcription factors of BCL6, a key regulator in Tfh cell diferentiation as detailed in the following papers:
- T Follicular Helper Cell Biology: A Decade of Discovery and Diseases
- Bcl6-Mediated Transcriptional Regulation of Follicular Helper T cells (Tfh)
# MISTy
library(mistyR)
library(future)
# Seurat
library(Seurat)
# data manipulation
library(Matrix)
library(tibble)
library(dplyr)
library(purrr)
# normalization
library(sctransform)
# resource
library(progeny)
library(dorothea)
# setup parallel execution
options(future.globals.maxSize = 1024^3, future.seed = 123)
future::plan(multisession)
set.seed(123)
source(here::here("misc/paths.R"))
source(here::here("utils/bin.R"))
source(here::here("utils/mistyR_bin.R"))
"{misty}/{plt_dir}" %>%
glue::glue() %>%
here::here() %>%
dir.create(path = .,
showWarnings = FALSE,
recursive = TRUE)
"{misty}/{robj_dir}" %>%
glue::glue() %>%
here::here() %>%
dir.create(path = .,
showWarnings = FALSE,
recursive = TRUE)
We exemplify usage using Seurat objects, however, MISTy is not dependent on this.
In our case, pathway activities are stored in the progeny assay and gene expression in the SCT assay
merged_se <- "{clust}/{robj_dir}/integrated_spatial.rds" %>%
glue::glue() %>%
here::here() %>%
readRDS(file = .)
Subset for 1 Visium sample
se_sub <- subset(merged_se, subset = gem_id == "esvq52_nluss5")
se_sub
## An object of class Seurat
## 26846 features across 3079 samples within 1 assay
## Active assay: Spatial (26846 features, 2000 variable features)
## 3 dimensional reductions calculated: pca, umap, harmony
se_sub@images <- se_sub@images[Seurat::Images(se_sub) == "esvq52_nluss5"]
Following the mistyR vignette
gene.expression <- Seurat::GetAssayData(se_sub, assay = "Spatial")
coverage <- rowSums(gene.expression > 0) / ncol(gene.expression)
slide.markers <- names(which(coverage >= 0.05))
se_sub <- Seurat::SCTransform(object = se_sub, assay = "Spatial", verbose = FALSE)
For this first approach we only select high confidence TF, we will also look at TF upstream to PRDM1 which plays an antagonistic role in Tfh differentiation and is repressed by BCL6.
tf_df <- dorothea::dorothea_hs %>%
dplyr::filter(confidence %in% c("A"))
tf_df %>% dplyr::filter(target == "BCL6")
bcl6_preprint <- tf_df %>% dplyr::filter(target == "BCL6") %>% dplyr::pull(tf)
bcl6_preprint <- c("BCL6", bcl6_preprint)
tf_df %>% dplyr::filter(target == "PRDM1")
bcl6_antagonist <- tf_df %>% dplyr::filter(target == "PRDM1") %>% dplyr::pull(tf)
In this example we will explain the expression of TF upstream of BCL6 in terms of three different views:
# Define assay for each view
view.assays <- list(
"main" = "SCT",
"para.preprint" = "SCT",
"para.antagonist" = "Spatial"
)
# Define features for each view
view.features <- list(
"main" = bcl6_preprint,
"para.preprint" = bcl6_preprint,
"para.antagonist" = bcl6_antagonist
)
# Define spatial context for each view
view.types <- list(
"main" = "intra",
"para.preprint" = "para",
"para.antagonist" = "para"
)
# Define additional parameters (l in the case of paraview)
view.params <- list(
"main" = NULL,
"para.preprint" = 5,
"para.antagonist" = 5
)
misty.out <- "bcl6_test"
Now that we have preprocessed the data and have decided on a question to analyze, we can create and run a mistyR workflow.
misty.results <- run_misty_seurat(
visium.slide = se_sub,
view.assays = view.assays,
view.features = view.features,
view.types = view.types,
view.params = view.params,
spot.ids = NULL, # Using the whole slide
out.alias = misty.out
) %>%
mistyR::collect_results()
## Generating paraview
## Warning in mistyR::clear_cache(view.data.init$misty.uniqueid): Cache folder for requested id doesn't exist.
## Generating paraview
## Warning in mistyR::clear_cache(view.data.init$misty.uniqueid): Cache folder for requested id doesn't exist.
## Training models
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target BCL6
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target FOSL2
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target FOXO3
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target FOXO4
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target SPI1
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target STAT3
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target STAT5A
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target STAT5B
## Warning in ...furrr_fn(...): Negative performance detected and replaced with 0 for target TP53
##
## Collecting improvements
##
## Collecting contributions
##
## Collecting importances
##
## Aggregating
MISTy gives answers to three general questions:
1. How much can the broader spatial context explain the expression of markers (in contrast to the intraview)?
This can be observed in the gain in R2 (or RMSE) of using the multiview model in contrast to the single main view model.
misty.results %>%
mistyR::plot_improvement_stats("gain.R2") %>%
mistyR::plot_improvement_stats("gain.RMSE")
In this example, BCL6 is a marker whose expression can be explained better by modeling the broader spatial context around each spot.
We can further inspect the significance of the gain in variance explained, by the assigned p-value of improvement based on cross-validation.
misty.results$improvements %>%
dplyr::filter(target == "BCL6") %>%
dplyr::arrange(value)
misty.results$improvements %>%
dplyr::filter(measure == "p.R2") %>%
dplyr::arrange(value)
In general, the significant gain in R2 can be interpreted as the following: “We can better explain the expression of marker X, when we consider additional views, other than the intrinsic view.”
2. How much do different view components contribute to explaining the expression?
misty.results %>% mistyR::plot_view_contributions()
# misty.results$contributions.stats %>% dplyr::filter(target == "BCL6")
misty.results$contributions.stats %>% dplyr::filter(target == "BCL6")
In the case of BCL6, we observe that around 60% of the contribution in the final model comes from the expression of other markers of TF of BCL6 from the paraview while 35% comes the intrinsic view.
3. What are the specific relations that can explain the contributions?
To explain the contributions, we can visualize the importances of markers coming from each view separately as predictors of the expression of the intrinsic markers of BCL6
First, the intrinsic importances of the hypoxia markers.
misty.results %>% mistyR::plot_interaction_heatmap(view = "intra")
These importances are associated to the relationship between markers in the same spot. Let’s pick the best predictor of BCL6 to confirm this:
misty.results$importances.aggregated[["intra"]] %>%
dplyr::select(Predictor, BCL6) %>%
dplyr::arrange(-BCL6)
Seurat::SpatialFeaturePlot(se_sub, features = c("STAT1", "TP53", "STAT3"))
Second, the paraview importances of the hypoxia markers.
misty.results %>% mistyR::plot_interaction_heatmap(view = "para.preprint_5")
misty.results$importances.aggregated[["para.preprint_5"]] %>%
dplyr::select(Predictor, BCL6) %>%
dplyr::arrange(-BCL6)
These importances are associated to the relationship between markers in the spot and markers in the neighborhood (controlled by our parameter l).
Seurat::SpatialFeaturePlot(se_sub, features = c("BCL6", "CTCF", "STAT1"))
As expected, the expression of PFKFB4 (the best predictor from this view) in the neighborhood of each spot allows to explain the expression of PGK1.
Finally, the paraview importances of the estrogen markers. We will inspect the best predictor in this view.
misty.results %>% mistyR::plot_interaction_heatmap(view = "para.antagonist_5")
misty.results$importances.aggregated[["para.antagonist_5"]] %>%
dplyr::select(Predictor, BCL6) %>%
dplyr::arrange(-BCL6)
Seurat::SpatialFeaturePlot(se_sub, features = c("BCL6", "E2F1", "TP53"))
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dorothea_1.2.2 progeny_1.12.0 sctransform_0.3.2 purrr_0.3.4 dplyr_1.0.6 tibble_3.1.2 Matrix_1.3-3 SeuratObject_4.0.1 Seurat_4.0.2 future_1.21.0 mistyR_1.0.2
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_2.0-1 deldir_0.2-10 ellipsis_0.3.2 ggridges_0.5.3 rprojroot_2.0.2 rstudioapi_0.13 spatstat.data_2.1-0 farver_2.1.0 furrr_0.2.2 leiden_0.3.7 listenv_0.8.0 ggrepel_0.9.1 fansi_0.4.2 R.methodsS3_1.8.1 codetools_0.2-18 splines_4.0.4 knitr_1.33 rlist_0.4.6.1 polyclip_1.10-0 jsonlite_1.7.2 bcellViper_1.26.0 ica_1.0-2 cluster_2.1.1 R.oo_1.24.0 png_0.1-7 uwot_0.1.10 shiny_1.6.0 spatstat.sparse_2.0-0 readr_1.4.0 compiler_4.0.4 httr_1.4.2 assertthat_0.2.1 fastmap_1.1.0 lazyeval_0.2.2 cli_2.5.0 later_1.2.0 htmltools_0.5.1.1 tools_4.0.4 igraph_1.2.6 gtable_0.3.0 glue_1.4.2 RANN_2.6.1 reshape2_1.4.4 Rcpp_1.0.6 scattermore_0.7 jquerylib_0.1.4 vctrs_0.3.8 nlme_3.1-152 lmtest_0.9-38 xfun_0.23 stringr_1.4.0 globals_0.14.0 mime_0.10
## [55] miniUI_0.1.1.1 lifecycle_1.0.0 irlba_2.3.3 goftest_1.2-2 MASS_7.3-53.1 zoo_1.8-9 scales_1.1.1 spatstat.core_2.1-2 hms_1.1.0 promises_1.2.0.1 spatstat.utils_2.1-0 parallel_4.0.4 RColorBrewer_1.1-2 yaml_2.2.1 reticulate_1.20 pbapply_1.4-3 gridExtra_2.3 ggplot2_3.3.3 sass_0.4.0 rpart_4.1-15 stringi_1.6.2 highr_0.9 distances_0.1.8 filelock_1.0.2 rlang_0.4.11 pkgconfig_2.0.3 matrixStats_0.58.0 evaluate_0.14 lattice_0.20-41 ROCR_1.0-11 tensor_1.5 labeling_0.4.2 patchwork_1.1.1 htmlwidgets_1.5.3 cowplot_1.1.1 tidyselect_1.1.1 here_1.0.1 parallelly_1.25.0 RcppAnnoy_0.0.18 plyr_1.8.6 magrittr_2.0.1 R6_2.5.0 generics_0.1.0 DBI_1.1.1 pillar_1.6.1 mgcv_1.8-35 fitdistrplus_1.1-3 survival_3.2-10 abind_1.4-5 future.apply_1.7.0 crayon_1.4.1 KernSmooth_2.23-20 utf8_1.2.1 spatstat.geom_2.1-0
## [109] plotly_4.9.3 rmarkdown_2.8 grid_4.0.4 data.table_1.14.0 digest_0.6.27 xtable_1.8-4 tidyr_1.1.3 httpuv_1.6.1 R.utils_2.10.1 munsell_0.5.0 viridisLite_0.4.0 bslib_0.2.5.1